In [27]: runfile('/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_classification.py', wdir='/Users/jiaxichen/Desktop/Final Project for Data Science/Master')
Reloaded modules: preprocessing, functions
None
=======================================================================
Support Vector Machine-----------------------------------------------------------------------
=======================================================================
SVM Classifier accuracy for TFIDF: 65.500%
cross-validation accuracy scores TFIDF SVM: [0.67 0.67 0.69875 0.65875 0.6525 0.6575 0.695 0.675 0.6775 0.67375]
cross-validation accuracy: 0.673 +/- 0.014
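The summary line above can be reproduced from the ten fold scores. This is a minimal sketch, not the project's code: it assumes the spread is a population standard deviation (the NumPy default), which is what matches the logged 0.014; a sample standard deviation would round to 0.015. The variable names are illustrative.

```python
from statistics import mean, pstdev

# Fold accuracies copied from the SVM TFIDF cross-validation line above.
svm_cv_scores = [0.67, 0.67, 0.69875, 0.65875, 0.6525,
                 0.6575, 0.695, 0.675, 0.6775, 0.67375]

cv_mean = mean(svm_cv_scores)    # arithmetic mean over the 10 folds
cv_std = pstdev(svm_cv_scores)   # population standard deviation (ddof=0), an assumption

summary = f"cross-validation accuracy: {cv_mean:.3f} +/- {cv_std:.3f}"
print(summary)
```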
Learning Curve for SVM TFIDF
Accuracy Plot for SVM TFIDF
------------Error Evaluation for SVM-------------
Error Evaluation for SVM TFIDF
                 precision    recall  f1-score   support
0:italian             0.70      0.69      0.70       400
1:mexican             0.44      0.57      0.50       400
2:southern_us         0.76      0.72      0.74       400
3:indian              0.67      0.73      0.70       400
4:chinese             0.68      0.68      0.68       400
5:french              0.36      0.53      0.43       400
6:cajun_creole        0.77      0.74      0.75       400
7:thai                0.75      0.80      0.78       400
8:japanese            0.63      0.60      0.61       400
9:greek               0.68      0.64      0.66       400
10:spanish            0.85      0.76      0.80       400
11:korean             0.79      0.62      0.70       400
12:vietnamese         0.84      0.81      0.83       400
13:moroccan           0.80      0.78      0.79       400
14:british            0.84      0.80      0.82       400
15:filipino           0.60      0.62      0.61       400
16:irish              0.58      0.56      0.57       400
17:jamaican           0.57      0.55      0.56       400
18:russian            0.73      0.65      0.68       400
19:brazilian          0.68      0.60      0.64       400
micro avg             0.67      0.67      0.67      8000
macro avg             0.69      0.67      0.68      8000
weighted avg          0.69      0.67      0.68      8000
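The macro avg row of the SVM TFIDF report is the unweighted mean of the per-class columns, and because every class has the same support (400), the weighted avg row coincides with it. The sketch below recomputes the macro row from the rounded per-class values printed above; the names are illustrative, and since the inputs are already rounded to two decimals the agreement is only expected at that precision.

```python
from statistics import mean

# (precision, recall, f1) per class, copied from the SVM TFIDF report above.
svm_rows = [
    (0.70, 0.69, 0.70), (0.44, 0.57, 0.50), (0.76, 0.72, 0.74), (0.67, 0.73, 0.70),
    (0.68, 0.68, 0.68), (0.36, 0.53, 0.43), (0.77, 0.74, 0.75), (0.75, 0.80, 0.78),
    (0.63, 0.60, 0.61), (0.68, 0.64, 0.66), (0.85, 0.76, 0.80), (0.79, 0.62, 0.70),
    (0.84, 0.81, 0.83), (0.80, 0.78, 0.79), (0.84, 0.80, 0.82), (0.60, 0.62, 0.61),
    (0.58, 0.56, 0.57), (0.57, 0.55, 0.56), (0.73, 0.65, 0.68), (0.68, 0.60, 0.64),
]

# Macro averaging: every class counts equally, regardless of support.
macro_precision = mean(p for p, _, _ in svm_rows)
macro_recall = mean(r for _, r, _ in svm_rows)
macro_f1 = mean(f for _, _, f in svm_rows)

print(f"macro avg {macro_precision:.2f} {macro_recall:.2f} {macro_f1:.2f}")
```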
Graphs - SVM TFIDF
Confusion matrix, without normalization
Normalized confusion matrix
=======================================================================
k-nearest neighbors--------------------------------------------------------------------
=======================================================================
KNN Classifier accuracy for TFIDF: 61.750%
cross-validation accuracy scores TFIDF KNN: [0.62125 0.6175 0.6325 0.60375 0.59125 0.61625 0.64875 0.6075 0.61875 0.6175]
cross-validation accuracy: 0.618 +/- 0.015
Learning Curve for KNN TFIDF
Accuracy Plot for KNN TFIDF
------------Error Evaluation for KNN-------------
Error Evaluation for KNN TFIDF
                 precision    recall  f1-score   support
0:italian             0.50      0.69      0.58       400
1:mexican             0.44      0.56      0.50       400
2:southern_us         0.64      0.71      0.67       400
3:indian              0.61      0.67      0.64       400
4:chinese             0.56      0.61      0.58       400
5:french              0.37      0.43      0.40       400
6:cajun_creole        0.65      0.74      0.70       400
7:thai                0.74      0.75      0.74       400
8:japanese            0.47      0.59      0.52       400
9:greek               0.67      0.56      0.61       400
10:spanish            0.73      0.73      0.73       400
11:korean             0.75      0.61      0.67       400
12:vietnamese         0.69      0.81      0.75       400
13:moroccan           0.78      0.76      0.77       400
14:british            0.73      0.73      0.73       400
15:filipino           0.61      0.51      0.55       400
16:irish              0.57      0.39      0.47       400
17:jamaican           0.63      0.37      0.47       400
18:russian            0.72      0.59      0.65       400
19:brazilian          0.69      0.54      0.61       400
micro avg             0.62      0.62      0.62      8000
macro avg             0.63      0.62      0.62      8000
weighted avg          0.63      0.62      0.62      8000
Graphs - KNN TFIDF
Confusion matrix, without normalization
Normalized confusion matrix
=======================================================================
Decision Tree--------------------------------------------------------------------
=======================================================================
Decision Tree Classifier accuracy for TFIDF: 43.938%
cross-validation accuracy scores TFIDF Decision Tree: [0.44625 0.41 0.425 0.46 0.42125 0.43 0.43125 0.4225 0.44 0.44625]
cross-validation accuracy: 0.433 +/- 0.014
Learning Curve for Decision Tree TFIDF
Accuracy Plot for Decision Tree TFIDF
------------Error Evaluation for Decision Tree-------------
Error Evaluation for Decision Tree TFIDF
                 precision    recall  f1-score   support
0:italian             0.53      0.49      0.51       400
1:mexican             0.28      0.31      0.30       400
2:southern_us         0.56      0.55      0.55       400
3:indian              0.43      0.45      0.44       400
4:chinese             0.39      0.41      0.40       400
5:french              0.23      0.30      0.26       400
6:cajun_creole        0.58      0.55      0.56       400
7:thai                0.57      0.54      0.56       400
8:japanese            0.35      0.37      0.36       400
9:greek               0.38      0.39      0.38       400
10:spanish            0.52      0.48      0.50       400
11:korean             0.45      0.44      0.45       400
12:vietnamese         0.56      0.53      0.54       400
13:moroccan           0.51      0.50      0.51       400
14:british            0.55      0.57      0.56       400
15:filipino           0.35      0.38      0.36       400
16:irish              0.32      0.30      0.31       400
17:jamaican           0.29      0.28      0.28       400
18:russian            0.53      0.45      0.49       400
19:brazilian          0.46      0.42      0.44       400
micro avg             0.43      0.43      0.43      8000
macro avg             0.44      0.43      0.44      8000
weighted avg          0.44      0.43      0.44      8000
Graphs - Decision Tree TFIDF
Confusion matrix, without normalization
Normalized confusion matrix
=======================================================================
Random Forest--------------------------------------------------------------------
=======================================================================
Random Forest Classifier accuracy for TFIDF: 41.188%
cross-validation accuracy scores TFIDF Random Forest: [0.42625 0.405 0.44875 0.40875 0.405 0.39125 0.43375 0.39625 0.4275 0.4575]
cross-validation accuracy: 0.420 +/- 0.021
Learning Curve for Random Forest TFIDF
Accuracy Plot for Random Forest TFIDF
------------Error Evaluation for Random Forest-------------
Error Evaluation for Random Forest TFIDF
                 precision    recall  f1-score   support
0:italian             0.36      0.30      0.33       400
1:mexican             0.13      0.49      0.21       400
2:southern_us         0.54      0.56      0.55       400
3:indian              0.48      0.50      0.49       400
4:chinese             0.49      0.22      0.30       400
5:french              0.14      0.08      0.10       400
6:cajun_creole        0.51      0.56      0.53       400
7:thai                0.62      0.62      0.62       400
8:japanese            0.29      0.25      0.27       400
9:greek               0.55      0.26      0.35       400
10:spanish            0.50      0.49      0.50       400
11:korean             0.80      0.42      0.55       400
12:vietnamese         0.52      0.73      0.61       400
13:moroccan           0.61      0.55      0.58       400
14:british            0.44      0.65      0.53       400
15:filipino           0.51      0.33      0.40       400
16:irish              0.47      0.14      0.22       400
17:jamaican           0.38      0.28      0.32       400
18:russian            0.48      0.48      0.48       400
19:brazilian          0.51      0.48      0.50       400
micro avg             0.42      0.42      0.42      8000
macro avg             0.47      0.42      0.42      8000
weighted avg          0.47      0.42      0.42      8000
Graphs - Random Forest TFIDF
Confusion matrix, without normalization
Normalized confusion matrix
Please enter comma-separated ingredients: salt, water, milk, oil
This is the prediction value ['japanese']
In [28]: runfile('/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_clustering.py', wdir='/Users/jiaxichen/Desktop/Final Project for Data Science/Master')
Reloaded modules: preprocessing, functions
None
Traceback (most recent call last):
File "<ipython-input-28-fa0a2050625e>", line 1, in <module>
runfile('/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_clustering.py', wdir='/Users/jiaxichen/Desktop/Final Project for Data Science/Master')
File "/Users/jiaxichen/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 704, in runfile
execfile(filename, namespace)
File "/Users/jiaxichen/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_clustering.py", line 56, in <module>
complete_vector, x_train, x_test, y_train, y_test, df_sample = classify_preprocessing(numCuisine)
ValueError: too many values to unpack (expected 6)
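The traceback above says that `classify_preprocessing(numCuisine)` returned more values than the six names on the left-hand side at line 56 of main_clustering.py. The log does not show what the function actually returns or how the call was changed before the next run, so the sketch below uses a hypothetical stand-in (`classify_preprocessing_stub`, returning seven placeholder values) only to reproduce the error and show one way of tolerating extra return values.

```python
def classify_preprocessing_stub(num_cuisine):
    # Hypothetical stand-in: returns seven placeholders, one more than the
    # caller expects. The real function's return values are not in the log.
    return ("complete_vector", "x_train", "x_test",
            "y_train", "y_test", "df_sample", "extra_value")

# Reproduce the failure: six targets, seven returned values.
try:
    complete_vector, x_train, x_test, y_train, y_test, df_sample = \
        classify_preprocessing_stub(20)
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)

# One possible fix: collect any additional return values in a starred name.
complete_vector, x_train, x_test, y_train, y_test, df_sample, *rest = \
    classify_preprocessing_stub(20)
print(rest)
```

Starred unpacking keeps the call site working if the function grows more return values, at the cost of hiding them. Listing every returned name explicitly is the stricter alternative.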
In [29]: runfile('/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_clustering.py', wdir='/Users/jiaxichen/Desktop/Final Project for Data Science/Master')
Reloaded modules: preprocessing, functions
None
List of countries: ['Greece', 'Texas', 'Philippines', 'India', 'Jamaica', 'Spain', 'Italy', 'Mexico', 'China', 'Britain', 'Thailand', 'Vietnam', 'United States', 'Brazil', 'France', 'Japan', 'Ireland', 'Korea', 'Morocco', 'Russia']
Please select your nationality:China
      cuisine  ...  ingredients_clean_string
1646  chinese  ...  fresh ginger , rolls , white vinegar , dry she...
1708  chinese  ...  spices , spinach leaves , long-grain rice , wa...
1679  chinese  ...  red chili peppers , sliced shallots , fish sau...
1684  chinese  ...  sugar , sesame oil , scallions , black vinegar...
1752  chinese  ...  ground ginger , reduced sodium soy sauce , bab...
1885  chinese  ...  soy sauce , sesame oil , carrots , lettuce , c...
1874  chinese  ...  jumbo shrimp , fresh ginger , bell pepper , ch...
1995  chinese  ...  jasmine rice , garlic , carrots , shallots , s...
1691  chinese  ...  soy sauce , scallions , garlic , pork loin cho...
1987  chinese  ...  water , sesame oil , rice vinegar , hoisin sau...
[10 rows x 4 columns]
------------------------------------
Here comes the good stuff
/Users/jiaxichen/anaconda3/lib/python3.7/site-packages/matplotlib/figure.py:98: MatplotlibDeprecationWarning:
Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
Enter the id of the recipe you like: 1684
This is the closest cuisine to your selection: korean
   recipe_id  ...  similarity_score
0      19747  ...            24.899
1      38856  ...            25.353
2      42964  ...            38.275
[3 rows x 4 columns]
This is the Kappa Score: -0.03128381278403536
This is the Silhouette score: 0.008061962380614962
This is the Rand index: 0.05922046692167975
Homogeneity Score: 0.20900771084265996
Completeness Score : 0.22929528779329858
V_measure: 0.2186819773436727
Adjusted Random Score: 0.05922046692167975
Adjusted Mutual Info Score: 0.2013478993524601
/Users/jiaxichen/anaconda3/lib/python3.7/site-packages/sklearn/metrics/cluster/supervised.py:732: FutureWarning:
The behavior of AMI will change in version 0.22. To match the behavior of 'v_measure_score', AMI will use average_method='arithmetic' by default.
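The clustering scores above are internally consistent in one checkable way: with the default weighting, the V-measure is the harmonic mean of homogeneity and completeness. The sketch below recomputes it from the two logged values using only the standard library; the variable names are illustrative. The "Rand index" and "Adjusted Random Score" lines print the same value, which suggests (but the log does not confirm) that both come from the adjusted Rand index.

```python
# Values copied from the clustering output above.
homogeneity = 0.20900771084265996
completeness = 0.22929528779329858
logged_v_measure = 0.2186819773436727

# V-measure with equal weighting: harmonic mean of homogeneity and completeness.
v_measure = 2 * homogeneity * completeness / (homogeneity + completeness)

print(f"recomputed V-measure: {v_measure:.6f}")
print(f"logged V-measure:     {logged_v_measure:.6f}")
```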
In [30]: